In this notebook we will perform the gene-peak links filtering by deferentially accessible regions (DARs) and, then, by the differentially expressed genes (DEGs). We need to remember that we are only analyzing BCL6 and PRDM1 genes links as proof of concepts and to reduce the computational time. Also, you will see how the number or link significantly decrease and how they look at the coverage plot.
In this report we will continue working with the previous Seurat object saved as 4.tonsil_bcell_linkpeaks.rdsin the results/R_object directory.
BiocManager::install("Repitools")
knitr::opts_chunk$set(echo = TRUE)
library(Signac)
library(Seurat)
library(ggplot2)
library(ggpubr)
library(tidyverse)
library(EnsDb.Hsapiens.v86)
library(plyr)
library(reshape2)
library(data.table)
library(GenomicRanges)
library(harmony)
library(hdf5r)
library(stringr)
library(ggpubr)
library(RColorBrewer)
library(magick)
library(knitr)
library(biovizBase)
library(patchwork)
library(Repitools)
library(qlcMatrix)
library(kableExtra)
library(BSgenome.Hsapiens.UCSC.hg38)
set.seed(123)
# Paths
path_to_objects <- here::here("results/R_objects/")
path_to_tables <- here::here("results/tables/")
tonsil_bcell <- readRDS(paste0(path_to_objects,"4.tonsil_bcell_linkpeaks.rds"))
deg<- read_csv(paste0(path_to_tables,"tonsil_bcell_deg.csv"))
## New names:
## Rows: 3696 Columns: 8
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (3): ...1, cluster, gene dbl (5): p_val, avg_log2FC, pct.1, pct.2, p_val_adj
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
dars<- read_csv(paste0(path_to_tables,"tonsil_bcell_dars.csv"))
## New names:
## Rows: 25861 Columns: 8
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (3): ...1, cluster, gene dbl (5): p_val, avg_log2FC, pct.1, pct.2, p_val_adj
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
Idents(tonsil_bcell)<-"annotation_level_1"
DimPlot(tonsil_bcell,label = TRUE, reduction = "wnn.umap", pt.size = 0.5,cols = c("#a6cee3", "#1f78b4","#b2df8a", "#fb9a99","#e31a1c")) + ggtitle("Joint UMAP")+NoLegend()
First we will see how many DARs are in each main B-cell population.
dars_nbc<-dars[dars$cluster=="NBC",]
dars_gcbc<-dars[dars$cluster=="GCBC",]
dars_mbc<-dars[dars$cluster=="MBC",]
dars_pc<-dars[dars$cluster=="PC",]
df_dars_celltype<-data.frame(cell_type=c("NBC","GCBC","MBC","PC"), num_dars=c(nrow(dars_nbc),nrow(dars_gcbc),nrow(dars_mbc),nrow(dars_pc)))
df_dars_celltype
## cell_type num_dars
## 1 NBC 384
## 2 GCBC 13494
## 3 MBC 1075
## 4 PC 10908
We can see that the GCBC and the PC are the one with more DARs that makes sense because GCBC undergo affinity maturation, multiply rapidly, and differentiate into memory B cells (MBC). The chromatin of NBC, since it has not been exposed to any antigen, is not differentiated for any specific function. MBC recovers the genome architecture of NBC after GC reaction. Therefore, MBC is more similar to NBC and is distinctly different from GCBC and PC.
The gene column actually corresponds to the DARs coordinates. So here we will split it into 3 columns in order to have one column for chromosome position, the start and the end coordinates of each peak. Doing so, then, we will be able to create a Granges.
coord_link_DARs<-str_split_fixed(dars$gene, "-", 3)
colnames(coord_link_DARs)<-c("chrom","start","end")
dars<-cbind(dars,coord_link_DARs)
#write_csv(dars,path_to_save_DARs)
In this dinamyc table we can see the 3 new columns.
DT::datatable(head(dars))
#remove column ...1 because is the same than the "gene" column
dars$...1<-NULL
From dataframe to Granges
First we have the change the name of “genes” column by “peaks”. and the “chrom” name as “seqnames” to match it with Links dataframe.
names(dars)[c(7,8)]
## [1] "gene" "chrom"
names(dars)[c(7,8)]<-c("peak","seqnames")
Now we will save the links in a data frame in order to join it with the dars dataframe.
#Joint links to DARs
df_linkpeak<-as.data.frame(Links(tonsil_bcell))
kbl(head(df_linkpeak),caption = "Table of Link peaks of level 1 tonsil object") %>%
kable_paper("striped", full_width = F)
| seqnames | start | end | width | strand | score | gene | peak | zscore | pvalue |
|---|---|---|---|---|---|---|---|---|---|
| chr3 | 278448 | 187745727 | 187467280 |
|
0.1024074 | BCL6 | chr3-277841-279055 | 1.883377 | 0.0298247 |
| chr3 | 1381200 | 187745727 | 186364528 |
|
0.0896937 | BCL6 | chr3-1380484-1381916 | 1.682024 | 0.0462821 |
| chr3 | 3852022 | 187745727 | 183893706 |
|
-0.0505353 | BCL6 | chr3-3851740-3852303 | -2.356752 | 0.0092178 |
| chr3 | 4071584 | 187745727 | 183674144 |
|
-0.0705619 | BCL6 | chr3-4071215-4071952 | -3.118217 | 0.0009097 |
| chr3 | 4493657 | 187745727 | 183252071 |
|
0.1073325 | BCL6 | chr3-4492184-4495129 | 1.880251 | 0.0300369 |
| chr3 | 4503337 | 187745727 | 183242391 |
|
0.1457728 | BCL6 | chr3-4502886-4503787 | 2.618903 | 0.0044106 |
Here we are going to join both data frame by the inner_join function which will keep only the links that are overlapping with the DARs.
df_links_DARs<-inner_join(df_linkpeak,dars,by="peak")
#change the "seqnames.x" "start.x" "end.x" "seqnames.y" "start.y" by "seqnames","start","end","start.peak","end.peak"
names(df_links_DARs)[c(1,2,3,18,19)]<-c("seqnames","start","end","start.peak","end.peak")
kbl(head(df_links_DARs),caption = "Table of gene-peak links filtered by DARs") %>%
kable_paper("striped", full_width = F)
| seqnames | start | end | width | strand | score | gene | peak | zscore | pvalue | p_val | avg_log2FC | pct.1 | pct.2 | p_val_adj | cluster | seqnames.y | start.peak | end.peak |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| chr3 | 278448 | 187745727 | 187467280 |
|
0.1024074 | BCL6 | chr3-277841-279055 | 1.883377 | 0.0298247 | 0.0e+00 | 0.7547529 | 0.234 | 0.112 | 0.00e+00 | GCBC | chr3 | 277841 | 279055 |
| chr3 | 1381200 | 187745727 | 186364528 |
|
0.0896937 | BCL6 | chr3-1380484-1381916 | 1.682024 | 0.0462821 | 0.0e+00 | 1.4056043 | 0.180 | 0.059 | 0.00e+00 | GCBC | chr3 | 1380484 | 1381916 |
| chr3 | 4503337 | 187745727 | 183242391 |
|
0.1457728 | BCL6 | chr3-4502886-4503787 | 2.618903 | 0.0044106 | 0.0e+00 | 0.8760536 | 0.322 | 0.143 | 0.00e+00 | GCBC | chr3 | 4502886 | 4503787 |
| chr3 | 4831286 | 187745727 | 182914442 |
|
-0.0603318 | BCL6 | chr3-4830884-4831688 | -2.207661 | 0.0136339 | 0.0e+00 | 0.8039113 | 0.127 | 0.082 | 2.24e-05 | MBC | chr3 | 4830884 | 4831688 |
| chr3 | 4831286 | 187745727 | 182914442 |
|
-0.0603318 | BCL6 | chr3-4830884-4831688 | -2.207661 | 0.0136339 | 2.2e-05 | 0.7131709 | 0.156 | 0.088 | 1.00e+00 | PC | chr3 | 4830884 | 4831688 |
| chr3 | 4978824 | 187745727 | 182766904 |
|
-0.1552091 | BCL6 | chr3-4978078-4979570 | -5.314769 | 0.0000001 | 0.0e+00 | 1.0301874 | 0.361 | 0.235 | 0.00e+00 | NBC | chr3 | 4978078 | 4979570 |
# remove the seqnames.y column becuase its the same as the seqnames column
df_links_DARs$seqnames.y<-NULL
Dataframe of the links filteres by DARs
kbl(head(df_links_DARs),caption = "Table of gene-peak links filtered by DARs") %>%
kable_paper("striped", full_width = F)
| seqnames | start | end | width | strand | score | gene | peak | zscore | pvalue | p_val | avg_log2FC | pct.1 | pct.2 | p_val_adj | cluster | start.peak | end.peak |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| chr3 | 278448 | 187745727 | 187467280 |
|
0.1024074 | BCL6 | chr3-277841-279055 | 1.883377 | 0.0298247 | 0.0e+00 | 0.7547529 | 0.234 | 0.112 | 0.00e+00 | GCBC | 277841 | 279055 |
| chr3 | 1381200 | 187745727 | 186364528 |
|
0.0896937 | BCL6 | chr3-1380484-1381916 | 1.682024 | 0.0462821 | 0.0e+00 | 1.4056043 | 0.180 | 0.059 | 0.00e+00 | GCBC | 1380484 | 1381916 |
| chr3 | 4503337 | 187745727 | 183242391 |
|
0.1457728 | BCL6 | chr3-4502886-4503787 | 2.618903 | 0.0044106 | 0.0e+00 | 0.8760536 | 0.322 | 0.143 | 0.00e+00 | GCBC | 4502886 | 4503787 |
| chr3 | 4831286 | 187745727 | 182914442 |
|
-0.0603318 | BCL6 | chr3-4830884-4831688 | -2.207661 | 0.0136339 | 0.0e+00 | 0.8039113 | 0.127 | 0.082 | 2.24e-05 | MBC | 4830884 | 4831688 |
| chr3 | 4831286 | 187745727 | 182914442 |
|
-0.0603318 | BCL6 | chr3-4830884-4831688 | -2.207661 | 0.0136339 | 2.2e-05 | 0.7131709 | 0.156 | 0.088 | 1.00e+00 | PC | 4830884 | 4831688 |
| chr3 | 4978824 | 187745727 | 182766904 |
|
-0.1552091 | BCL6 | chr3-4978078-4979570 | -5.314769 | 0.0000001 | 0.0e+00 | 1.0301874 | 0.361 | 0.235 | 0.00e+00 | NBC | 4978078 | 4979570 |
#save the df link filtered by DARs
write_csv(df_links_DARs,paste0(path_to_tables,"5.Links_filtered_by_DARs.csv"))
Since the Links data is in GRanges format we have to save the new set of link in this format. To do so, the df_links_DARs dataframe needs to become in a GRanges format using the makeGRangesFromDataFrame function.
To learn more about GRanges format you can check this tutorial
tonsil_bcell_links_DARs<-tonsil_bcell
Links(tonsil_bcell_links_DARs)<-makeGRangesFromDataFrame(df_links_DARs,
keep.extra.columns=FALSE,
ignore.strand=FALSE,
seqinfo=NULL,
seqnames.field=c("seqnames", "seqname","chromosome",
"chrom","chr","chromosome_name","seqid"),
start.field="start",
end.field=c("end", "stop"),
strand.field="strand",
starts.in.df.are.0based=FALSE)
values(Links(tonsil_bcell_links_DARs))<-DataFrame(score=df_links_DARs$score,
gene=df_links_DARs$gene,
cluster=df_links_DARs$cluster,
peak=df_links_DARs$peak,
zscore=df_links_DARs$zscore,
pvalue=df_links_DARs$pvalue,
p_val.dars=df_links_DARs$p_val,
pct.1=df_links_DARs$pct.1,
pct.2=df_links_DARs$pct.2,
avg_log2FC=df_links_DARs$avg_log2FC,
start.peak=df_links_DARs$start.peak,
end.peak=df_links_DARs$end.peak)
coverage_extend{r}is a function that create a coverageplot for a certain gene and region using different upstream and downstream extension region.
coverage_extend <- function(x,y,seuratobject){purrr::map(y, function(y) {
p <- CoveragePlot(
object = seuratobject,
region = x,
features = x,
expression.assay = "RNA",
idents = idents.plot,
extend.upstream = y,
extend.downstream = y
#tile = TRUE
)
p & scale_fill_manual(values = cols_cluster)
})}
ranges.show <- StringToGRanges("chr3-187721377-187745725")
# set the colors will have each chromatin accesibility profile of each cell type. That match with the colors in the UMAP
cols_cluster <- c("#a6cee3", "#1f78b4","#b2df8a", "#fb9a99")
idents.plot <- c("GCBC", "NBC", "MBC", "PC")
We will see the link at different level: - Gene body level: 0 bp from TSS - Promoter level: 2000 from TSS - Long genomic distances from gene body: 10000,1e+7 bp from TSS
coverage_extend("BCL6",c(0,2000,10000,1e+7),tonsil_bcell)
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## [[1]]
## Warning: Removed 1 rows containing missing values (`geom_segment()`).
## Removed 1 rows containing missing values (`geom_segment()`).
##
## [[2]]
##
## [[3]]
##
## [[4]]
## Warning: Removed 71 rows containing missing values (`geom_segment()`).
## Removed 1 rows containing missing values (`geom_segment()`).
## Removed 1 rows containing missing values (`geom_segment()`).
coverage_extend("PRDM1",c(0,2000,10000,1e+7),tonsil_bcell)
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## [[1]]
## Warning: Removed 55 rows containing missing values (`geom_segment()`).
## Warning: Removed 1 rows containing missing values (`geom_segment()`).
## Removed 1 rows containing missing values (`geom_segment()`).
##
## [[2]]
## Warning: Removed 55 rows containing missing values (`geom_segment()`).
## Removed 1 rows containing missing values (`geom_segment()`).
##
## [[3]]
## Warning: Removed 54 rows containing missing values (`geom_segment()`).
## Removed 1 rows containing missing values (`geom_segment()`).
##
## [[4]]
## Warning: Removed 27 rows containing missing values (`geom_segment()`).
## Removed 1 rows containing missing values (`geom_segment()`).
coverage_extend("BCL6",c(0,2000,10000,1e+7),tonsil_bcell_links_DARs)
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## [[1]]
## Warning: Removed 1 rows containing missing values (`geom_segment()`).
## Removed 1 rows containing missing values (`geom_segment()`).
##
## [[2]]
##
## [[3]]
##
## [[4]]
## Warning: Removed 71 rows containing missing values (`geom_segment()`).
## Removed 1 rows containing missing values (`geom_segment()`).
## Removed 1 rows containing missing values (`geom_segment()`).
coverage_extend("PRDM1",c(0,2000,10000,100000,1e+6),tonsil_bcell_links_DARs)
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## [[1]]
## Warning: Removed 55 rows containing missing values (`geom_segment()`).
## Warning: Removed 1 rows containing missing values (`geom_segment()`).
## Removed 1 rows containing missing values (`geom_segment()`).
##
## [[2]]
## Warning: Removed 55 rows containing missing values (`geom_segment()`).
## Removed 1 rows containing missing values (`geom_segment()`).
##
## [[3]]
## Warning: Removed 54 rows containing missing values (`geom_segment()`).
## Removed 1 rows containing missing values (`geom_segment()`).
##
## [[4]]
## Warning: Removed 40 rows containing missing values (`geom_segment()`).
##
## [[5]]
## Warning: Removed 2 rows containing missing values (`geom_segment()`).
Now, the already filtered Links by DARs are going to be filtered by DEG.
deg_nbc<-deg[deg$cluster=="NBC",]
deg_gcbc<-deg[deg$cluster=="GCBC",]
deg_mbc<-deg[deg$cluster=="MBC",]
deg_pc<-deg[deg$cluster=="PC",]
df_deg_celltype<-data.frame(cell_type=c("NBC","GCBC","MBC","PC"), num_deg=c(nrow(deg_nbc),nrow(deg_gcbc),nrow(deg_mbc),nrow(deg_pc)))
df_deg_celltype
## cell_type num_deg
## 1 NBC 705
## 2 GCBC 1563
## 3 MBC 804
## 4 PC 624
kbl(head(deg),caption = "Table of DEG markers") %>%
kable_paper("striped", full_width = F)
| …1 | p_val | avg_log2FC | pct.1 | pct.2 | p_val_adj | cluster | gene |
|---|---|---|---|---|---|---|---|
| AC023590.1 | 0 | 3.401152 | 0.977 | 0.169 | 0 | GCBC | AC023590.1 |
| RAPGEF5 | 0 | 2.915248 | 0.932 | 0.125 | 0 | GCBC | RAPGEF5 |
| MAML3 | 0 | 2.798914 | 0.820 | 0.133 | 0 | GCBC | MAML3 |
| MYBL1 | 0 | 2.785654 | 0.857 | 0.048 | 0 | GCBC | MYBL1 |
| AFF2 | 0 | 2.765358 | 0.944 | 0.165 | 0 | GCBC | AFF2 |
| LPP | 0 | 2.746521 | 0.984 | 0.735 | 0 | GCBC | LPP |
#remove ..1 column becuase is the same than "gene" column
deg$...1<-NULL
df_links_DARs_DEG<-inner_join(df_links_DARs,deg,by=c("gene","cluster"))
head(df_links_DARs_DEG)
## seqnames start end width strand score gene
## 1 chr3 278448 187745727 187467280 * 0.10240736 BCL6
## 2 chr3 1381200 187745727 186364528 * 0.08969371 BCL6
## 3 chr3 4503337 187745727 183242391 * 0.14577278 BCL6
## 4 chr3 5181772 187745727 182563956 * 0.12040973 BCL6
## 5 chr3 8501626 187745727 179244102 * 0.11310603 BCL6
## 6 chr3 9653382 187745727 178092346 * 0.14435103 BCL6
## peak zscore pvalue p_val.x avg_log2FC.x pct.1.x
## 1 chr3-277841-279055 1.883377 0.029824678 4.549574e-44 0.7547529 0.234
## 2 chr3-1380484-1381916 1.682024 0.046282074 1.904554e-66 1.4056043 0.180
## 3 chr3-4502886-4503787 2.618903 0.004410650 1.328958e-74 0.8760536 0.322
## 4 chr3-5181373-5182171 2.096884 0.018001924 1.563924e-62 0.8788428 0.271
## 5 chr3-8501293-8501959 1.679920 0.046486419 1.203616e-69 0.9265237 0.288
## 6 chr3-9652559-9654205 2.609895 0.004528503 1.386127e-126 1.3792834 0.316
## pct.2.x p_val_adj.x cluster start.peak end.peak p_val.y avg_log2FC.y
## 1 0.112 5.950752e-39 GCBC 277841 279055 0 1.382211
## 2 0.059 2.491119e-61 GCBC 1380484 1381916 0 1.382211
## 3 0.143 1.738251e-69 GCBC 4502886 4503787 0 1.382211
## 4 0.119 2.045581e-57 GCBC 5181373 5182171 0 1.382211
## 5 0.124 1.574306e-64 GCBC 8501293 8501959 0 1.382211
## 6 0.100 1.813027e-121 GCBC 9652559 9654205 0 1.382211
## pct.1.y pct.2.y p_val_adj.y
## 1 0.758 0.096 0
## 2 0.758 0.096 0
## 3 0.758 0.096 0
## 4 0.758 0.096 0
## 5 0.758 0.096 0
## 6 0.758 0.096 0
names(df_links_DARs_DEG)[c(11,12,13,14,15,19,20,21,22,23)]
## [1] "p_val.x" "avg_log2FC.x" "pct.1.x" "pct.2.x" "p_val_adj.x"
## [6] "p_val.y" "avg_log2FC.y" "pct.1.y" "pct.2.y" "p_val_adj.y"
we need to change these column names: x = dars and y=deg
#
names(df_links_DARs_DEG)[c(11,12,13,14,15,19,20,21,22,23)]<-c("p_val.dars","avg_log2FC.dars","pct.1.dars" ,"pct.2.dars" ,"p_val_adj.dars","p_val.deg", "avg_log2FC.deg","pct.1.deg" ,"pct.2.deg", "p_val_adj.deg")
head(df_links_DARs_DEG)
## seqnames start end width strand score gene
## 1 chr3 278448 187745727 187467280 * 0.10240736 BCL6
## 2 chr3 1381200 187745727 186364528 * 0.08969371 BCL6
## 3 chr3 4503337 187745727 183242391 * 0.14577278 BCL6
## 4 chr3 5181772 187745727 182563956 * 0.12040973 BCL6
## 5 chr3 8501626 187745727 179244102 * 0.11310603 BCL6
## 6 chr3 9653382 187745727 178092346 * 0.14435103 BCL6
## peak zscore pvalue p_val.dars avg_log2FC.dars
## 1 chr3-277841-279055 1.883377 0.029824678 4.549574e-44 0.7547529
## 2 chr3-1380484-1381916 1.682024 0.046282074 1.904554e-66 1.4056043
## 3 chr3-4502886-4503787 2.618903 0.004410650 1.328958e-74 0.8760536
## 4 chr3-5181373-5182171 2.096884 0.018001924 1.563924e-62 0.8788428
## 5 chr3-8501293-8501959 1.679920 0.046486419 1.203616e-69 0.9265237
## 6 chr3-9652559-9654205 2.609895 0.004528503 1.386127e-126 1.3792834
## pct.1.dars pct.2.dars p_val_adj.dars cluster start.peak end.peak p_val.deg
## 1 0.234 0.112 5.950752e-39 GCBC 277841 279055 0
## 2 0.180 0.059 2.491119e-61 GCBC 1380484 1381916 0
## 3 0.322 0.143 1.738251e-69 GCBC 4502886 4503787 0
## 4 0.271 0.119 2.045581e-57 GCBC 5181373 5182171 0
## 5 0.288 0.124 1.574306e-64 GCBC 8501293 8501959 0
## 6 0.316 0.100 1.813027e-121 GCBC 9652559 9654205 0
## avg_log2FC.deg pct.1.deg pct.2.deg p_val_adj.deg
## 1 1.382211 0.758 0.096 0
## 2 1.382211 0.758 0.096 0
## 3 1.382211 0.758 0.096 0
## 4 1.382211 0.758 0.096 0
## 5 1.382211 0.758 0.096 0
## 6 1.382211 0.758 0.096 0
#write_csv(df_links_DARs_DEG,path_to_save_df_join_link_DARs_DEG)
kbl(head(df_links_DARs_DEG),caption = "Table of the filtered links by DARs and DEG") %>%
kable_paper("striped", full_width = F)
| seqnames | start | end | width | strand | score | gene | peak | zscore | pvalue | p_val.dars | avg_log2FC.dars | pct.1.dars | pct.2.dars | p_val_adj.dars | cluster | start.peak | end.peak | p_val.deg | avg_log2FC.deg | pct.1.deg | pct.2.deg | p_val_adj.deg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| chr3 | 278448 | 187745727 | 187467280 |
|
0.1024074 | BCL6 | chr3-277841-279055 | 1.883377 | 0.0298247 | 0 | 0.7547529 | 0.234 | 0.112 | 0 | GCBC | 277841 | 279055 | 0 | 1.382211 | 0.758 | 0.096 | 0 |
| chr3 | 1381200 | 187745727 | 186364528 |
|
0.0896937 | BCL6 | chr3-1380484-1381916 | 1.682024 | 0.0462821 | 0 | 1.4056043 | 0.180 | 0.059 | 0 | GCBC | 1380484 | 1381916 | 0 | 1.382211 | 0.758 | 0.096 | 0 |
| chr3 | 4503337 | 187745727 | 183242391 |
|
0.1457728 | BCL6 | chr3-4502886-4503787 | 2.618903 | 0.0044106 | 0 | 0.8760536 | 0.322 | 0.143 | 0 | GCBC | 4502886 | 4503787 | 0 | 1.382211 | 0.758 | 0.096 | 0 |
| chr3 | 5181772 | 187745727 | 182563956 |
|
0.1204097 | BCL6 | chr3-5181373-5182171 | 2.096884 | 0.0180019 | 0 | 0.8788428 | 0.271 | 0.119 | 0 | GCBC | 5181373 | 5182171 | 0 | 1.382211 | 0.758 | 0.096 | 0 |
| chr3 | 8501626 | 187745727 | 179244102 |
|
0.1131060 | BCL6 | chr3-8501293-8501959 | 1.679920 | 0.0464864 | 0 | 0.9265237 | 0.288 | 0.124 | 0 | GCBC | 8501293 | 8501959 | 0 | 1.382211 | 0.758 | 0.096 | 0 |
| chr3 | 9653382 | 187745727 | 178092346 |
|
0.1443510 | BCL6 | chr3-9652559-9654205 | 2.609895 | 0.0045285 | 0 | 1.3792834 | 0.316 | 0.100 | 0 | GCBC | 9652559 | 9654205 | 0 | 1.382211 | 0.758 | 0.096 | 0 |
tonsil_bcell_links_DARs_DEG<-tonsil_bcell
#Create the new link GRanges data into the new seurat object.
Links(tonsil_bcell_links_DARs_DEG)<-makeGRangesFromDataFrame(df_links_DARs_DEG,
keep.extra.columns=FALSE,
ignore.strand=FALSE,
seqinfo=NULL,
seqnames.field=c("seqnames", "seqname","chromosome",
"chrom","chr","chromosome_name","seqid"),
start.field="start",
end.field=c("end", "stop"),
strand.field="strand",
starts.in.df.are.0based=FALSE)
# Add the values of each link
values(Links(tonsil_bcell_links_DARs_DEG))<-DataFrame(score=df_links_DARs_DEG$score,
gene=df_links_DARs_DEG$gene,
cluster=df_links_DARs_DEG$cluster,
peak=df_links_DARs_DEG$peak,
zscore=df_links_DARs_DEG$zscore,
pvalue=df_links_DARs_DEG$pvalue,
p_val.dars=df_links_DARs_DEG$p_val.dars,
pct.1.dars=df_links_DARs_DEG$pct.1.dars,
pct.2.dars=df_links_DARs_DEG$pct.2.dars,
avg_log2FC.dars=df_links_DARs_DEG$avg_log2FC.dars,
start.peak=df_links_DARs_DEG$start.peak,
end.peak=df_links_DARs_DEG$end.peak,
p_val.deg=df_links_DARs_DEG$p_val.deg,
avg_log2FC.deg=df_links_DARs_DEG$avg_log2FC.deg,
pct.1.deg=df_links_DARs_DEG$pct.1.deg,
pct.2.deg=df_links_DARs_DEG$pct.2.deg,
p_val_adj.deg=df_links_DARs_DEG$p_val_adj.deg)
table(df_links_DARs_DEG$gene=="BCL6",df_links_DARs_DEG$cluster)
##
## GCBC PC
## FALSE 0 283
## TRUE 394 0
As we can see we were able to keep the links that are correlating each DEGs with the DARs. Since BCL6 is DEG in GCBC we are only keeping the links that correlated BCL6 within the DARs. The same happen with PRDM1 but in PC cluster.
table(df_links_DARs_DEG$gene=="PRDM1",df_links_DARs_DEG$cluster)
##
## GCBC PC
## FALSE 394 0
## TRUE 0 283
ranges.show <- StringToGRanges("chr3-187721377-187745725")
cols_cluster <- c("#a6cee3", "#1f78b4","#b2df8a", "#fb9a99")
idents.plot <- c("GCBC", "NBC", "MBC", "PC")
coverage_extend("BCL6",c(0,2000,10000,1e+7),tonsil_bcell_links_DARs_DEG)
## [[1]]
##
## [[2]]
##
## [[3]]
##
## [[4]]
coverage_extend("PRDM1",c(0,2000,10000,1e+7),tonsil_bcell_links_DARs_DEG)
## [[1]]
##
## [[2]]
##
## [[3]]
##
## [[4]]
table(df_linkpeak$gene=="BCL6")
##
## FALSE TRUE
## 584 721
table(df_links_DARs_DEG$gene=="BCL6")
##
## FALSE TRUE
## 283 394
print(paste0("Initial number of links: ", nrow(df_linkpeak)))
## [1] "Initial number of links: 1305"
print(paste0("Number of links filtered by DARs: ", nrow(df_links_DARs)))
## [1] "Number of links filtered by DARs: 860"
print(paste0("Number of links filtered by DARs and DEG: ", nrow(df_links_DARs_DEG)))
## [1] "Number of links filtered by DARs and DEG: 677"
df_data_steps_links<-data.frame(Ini_peaks=nrow(df_linkpeak),peaks_DARs=nrow(df_links_DARs),peaks_DARs_DEA=nrow(df_links_DARs_DEG))
|
Filtered by
|
||
|---|---|---|
| Ini_peaks | peaks_DARs | peaks_DARs_DEA |
| 1305 | 860 | 677 |
table(df_linkpeak$gene=="BCL6")
##
## FALSE TRUE
## 584 721
table(df_links_DARs$gene=="BCL6",df_links_DARs$cluster)
##
## GCBC MBC NBC PC
## FALSE 74 5 0 283
## TRUE 394 10 4 90
table(df_links_DARs_DEG$gene=="BCL6",df_links_DARs_DEG$cluster)
##
## GCBC PC
## FALSE 0 283
## TRUE 394 0
table(df_linkpeak$gene=="PRDM1")
##
## FALSE TRUE
## 721 584
table(df_links_DARs$gene=="PRDM1",df_links_DARs$cluster)
##
## GCBC MBC NBC PC
## FALSE 394 10 4 90
## TRUE 74 5 0 283
table(df_links_DARs_DEG$gene=="PRDM1",df_links_DARs_DEG$cluster)
##
## GCBC PC
## FALSE 394 0
## TRUE 0 283
sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur ... 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
##
## locale:
## [1] es_ES.UTF-8/es_ES.UTF-8/es_ES.UTF-8/C/es_ES.UTF-8/es_ES.UTF-8
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] BSgenome.Hsapiens.UCSC.hg38_1.4.4 BSgenome_1.64.0
## [3] rtracklayer_1.56.1 Biostrings_2.66.0
## [5] XVector_0.38.0 kableExtra_1.3.4
## [7] qlcMatrix_0.9.7 sparsesvd_0.2-1
## [9] slam_0.1-50 Matrix_1.5-3
## [11] Repitools_1.42.0 patchwork_1.1.2
## [13] biovizBase_1.44.0 knitr_1.40
## [15] magick_2.7.3 RColorBrewer_1.1-3
## [17] hdf5r_1.3.7 harmony_0.1.1
## [19] Rcpp_1.0.9 data.table_1.14.4
## [21] reshape2_1.4.4 plyr_1.8.8
## [23] EnsDb.Hsapiens.v86_2.99.0 ensembldb_2.20.2
## [25] AnnotationFilter_1.20.0 GenomicFeatures_1.48.4
## [27] AnnotationDbi_1.60.0 Biobase_2.58.0
## [29] GenomicRanges_1.50.1 GenomeInfoDb_1.34.2
## [31] IRanges_2.32.0 S4Vectors_0.36.0
## [33] BiocGenerics_0.44.0 forcats_0.5.2
## [35] stringr_1.4.1 dplyr_1.0.10
## [37] purrr_0.3.5 readr_2.1.3
## [39] tidyr_1.2.1 tibble_3.1.8
## [41] tidyverse_1.3.2 ggpubr_0.5.0
## [43] ggplot2_3.4.0 SeuratObject_4.1.3
## [45] Seurat_4.2.1 Signac_1.8.0
## [47] BiocStyle_2.24.0
##
## loaded via a namespace (and not attached):
## [1] Hmisc_4.7-1 ica_1.0-3
## [3] svglite_2.1.0 RcppRoll_0.3.0
## [5] Rsamtools_2.12.0 lmtest_0.9-40
## [7] rprojroot_2.0.3 crayon_1.5.2
## [9] MASS_7.3-58.1 nlme_3.1-160
## [11] backports_1.4.1 reprex_2.0.2
## [13] rlang_1.0.6 ROCR_1.0-11
## [15] readxl_1.4.1 irlba_2.3.5.1
## [17] limma_3.52.4 filelock_1.0.2
## [19] BiocParallel_1.30.4 rjson_0.2.21
## [21] bit64_4.0.5 glue_1.6.2
## [23] sctransform_0.3.5 parallel_4.2.1
## [25] spatstat.sparse_3.0-0 vsn_3.64.0
## [27] spatstat.geom_3.0-3 haven_2.5.1
## [29] tidyselect_1.2.0 SummarizedExperiment_1.28.0
## [31] fitdistrplus_1.1-8 XML_3.99-0.12
## [33] zoo_1.8-11 GenomicAlignments_1.32.1
## [35] xtable_1.8-4 magrittr_2.0.3
## [37] evaluate_0.18 cli_3.4.1
## [39] zlibbioc_1.44.0 rstudioapi_0.14
## [41] miniUI_0.1.1.1 sp_1.5-1
## [43] bslib_0.4.1 rpart_4.1.19
## [45] fastmatch_1.1-3 shiny_1.7.3
## [47] xfun_0.35 cluster_2.1.4
## [49] caTools_1.18.2 KEGGREST_1.38.0
## [51] ggrepel_0.9.2 listenv_0.8.0
## [53] png_0.1-7 future_1.29.0
## [55] withr_2.5.0 ggforce_0.4.1
## [57] bitops_1.0-7 cellranger_1.1.0
## [59] pillar_1.8.1 gplots_3.1.3
## [61] cachem_1.0.6 fs_1.5.2
## [63] vctrs_0.5.1 ellipsis_0.3.2
## [65] generics_0.1.3 tools_4.2.1
## [67] foreign_0.8-83 tweenr_2.0.2
## [69] munsell_0.5.0 DelayedArray_0.24.0
## [71] fastmap_1.1.0 compiler_4.2.1
## [73] abind_1.4-5 httpuv_1.6.6
## [75] Ringo_1.60.0 plotly_4.10.1
## [77] rgeos_0.5-9 GenomeInfoDbData_1.2.9
## [79] gridExtra_2.3 DNAcopy_1.70.0
## [81] edgeR_3.38.4 lattice_0.20-45
## [83] deldir_1.0-6 utf8_1.2.2
## [85] later_1.3.0 BiocFileCache_2.6.0
## [87] jsonlite_1.8.3 affy_1.74.0
## [89] scales_1.2.1 docopt_0.7.1
## [91] pbapply_1.6-0 carData_3.0-5
## [93] genefilter_1.78.0 lazyeval_0.2.2
## [95] promises_1.2.0.1 car_3.1-1
## [97] latticeExtra_0.6-30 goftest_1.2-3
## [99] spatstat.utils_3.0-1 reticulate_1.26
## [101] checkmate_2.1.0 rmarkdown_2.18
## [103] cowplot_1.1.1 webshot_0.5.4
## [105] Rtsne_0.16 dichromat_2.0-0.1
## [107] uwot_0.1.14 igraph_1.3.5
## [109] survival_3.4-0 yaml_2.3.6
## [111] systemfonts_1.0.4 htmltools_0.5.3
## [113] memoise_2.0.1 VariantAnnotation_1.42.1
## [115] BiocIO_1.6.0 locfit_1.5-9.6
## [117] here_1.0.1 gsmoothr_0.1.7
## [119] viridisLite_0.4.1 digest_0.6.30
## [121] assertthat_0.2.1 mime_0.12
## [123] rappdirs_0.3.3 RSQLite_2.2.18
## [125] future.apply_1.10.0 blob_1.2.3
## [127] preprocessCore_1.58.0 splines_4.2.1
## [129] Formula_1.2-4 labeling_0.4.2
## [131] googledrive_2.0.0 ProtGenerics_1.28.0
## [133] RCurl_1.98-1.9 broom_1.0.1
## [135] hms_1.1.2 modelr_0.1.10
## [137] colorspace_2.0-3 base64enc_0.1-3
## [139] BiocManager_1.30.19 nnet_7.3-18
## [141] sass_0.4.2 bookdown_0.30
## [143] RANN_2.6.1 fansi_1.0.3
## [145] tzdb_0.3.0 truncnorm_1.0-8
## [147] parallelly_1.32.1 R6_2.5.1
## [149] grid_4.2.1 ggridges_0.5.4
## [151] lifecycle_1.0.3 curl_4.3.3
## [153] ggsignif_0.6.4 googlesheets4_1.0.1
## [155] affyio_1.66.0 leiden_0.4.3
## [157] jquerylib_0.1.4 RcppAnnoy_0.0.20
## [159] spatstat.explore_3.0-5 htmlwidgets_1.5.4
## [161] polyclip_1.10-4 biomaRt_2.52.0
## [163] crosstalk_1.2.0 timechange_0.1.1
## [165] rvest_1.0.3 globals_0.16.1
## [167] htmlTable_2.4.1 spatstat.random_3.0-1
## [169] progressr_0.11.0 codetools_0.2-18
## [171] matrixStats_0.62.0 lubridate_1.9.0
## [173] gtools_3.9.3 prettyunits_1.1.1
## [175] dbplyr_2.2.1 gtable_0.3.1
## [177] DBI_1.1.3 highr_0.9
## [179] tensor_1.5 httr_1.4.4
## [181] KernSmooth_2.23-20 stringi_1.7.8
## [183] vroom_1.6.0 progress_1.2.2
## [185] farver_2.1.1 annotate_1.74.0
## [187] DT_0.26 xml2_1.3.3
## [189] restfulr_0.0.15 interp_1.1-3
## [191] scattermore_0.8 bit_4.0.5
## [193] jpeg_0.1-9 MatrixGenerics_1.10.0
## [195] spatstat.data_3.0-0 pkgconfig_2.0.3
## [197] gargle_1.2.1 rstatix_0.7.1
## [199] Rsolnp_1.16